Spherical embedding algorithm based on Kullback-Leibler divergence and distances between nearest neighbor points
ZHANG Bianlan, LU Yonggang, ZHANG Haitao
Journal of Computer Applications    2017, 37 (3): 680-683.   DOI: 10.11772/j.issn.1001-9081.2017.03.680
Aiming at the problem that existing spherical embedding algorithms cannot effectively embed data into a low-dimensional space when the distances between far-apart points are inaccurate or missing, a new spherical embedding method was proposed. It takes only the distances between nearest neighbor points as input, embeds high-dimensional data of any scale onto the unit sphere, and then estimates the radius of the sphere that best fits the distribution of the original data. Starting from a randomly generated spherical distribution, the Kullback-Leibler (KL) divergence was used to measure the difference between the normalized distances of each pair of neighboring points in the original space and in the spherical space, and the objective function was constructed from this difference. Stochastic gradient descent with momentum was then used to optimize the distribution of the points on the sphere until the result became stable. To test the algorithm, two types of spherical data sets were simulated: a uniform distribution and a Kent distribution on the unit sphere. The experimental results show that, for the uniformly distributed data, the data can be embedded accurately into the spherical space even when the number of neighbors is very small: the Root Mean Square Error (RMSE) between the embedded and original data distributions is less than 0.00001, and the estimation error of the spherical radius is less than 0.000001. For the Kent-distributed data, the data can be embedded accurately when the number of neighbors is large. Therefore, even when the distances between far-apart points are missing, the proposed method can still embed data into a low-dimensional space quite accurately, which is very helpful for data visualization.
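The optimization described in the abstract condenses into a short sketch. The NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation: it takes k-nearest-neighbor Euclidean distances as input, forms the KL divergence between the normalized input distances and the normalized geodesic distances on the unit sphere, and minimizes it with (full-batch, for brevity, where the paper uses stochastic updates) gradient descent with momentum. All names, parameter values, and the ratio-based radius estimate at the end are illustrative.

import numpy as np

def spherical_embed(X, k=10, dim=3, lr=0.1, momentum=0.9, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # k nearest neighbors and their distances in the original space
    # (full pairwise matrix for brevity; O(n^2) memory)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nbr = np.argsort(D, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbr.ravel()
    d = D[rows, cols]            # input neighbor distances, flattened
    p = d / d.sum()              # normalized input distances

    # random starting distribution on the unit sphere
    Y = rng.normal(size=(n, dim))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    V = np.zeros_like(Y)         # momentum buffer

    for _ in range(n_iter):
        u = np.sum(Y[rows] * Y[cols], axis=1).clip(-1 + 1e-9, 1 - 1e-9)
        g = np.arccos(u)         # geodesic distances on the unit sphere
        S = g.sum()
        # gradient of KL(p || g/S) with respect to each g: 1/S - p/g
        dg = 1.0 / S - p / g
        # chain rule: d arccos(y_i . y_j) / d y_i = -y_j / sqrt(1 - u^2)
        coef = (dg / np.sqrt(1.0 - u ** 2))[:, None]
        grad = np.zeros_like(Y)
        np.add.at(grad, rows, -coef * Y[cols])
        np.add.at(grad, cols, -coef * Y[rows])
        V = momentum * V - lr * grad
        Y += V
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # stay on the sphere
    r = d.sum() / g.sum()        # ratio-based estimate of the sphere radius
    return Y, r

Re-normalizing Y after every step keeps the points on the unit sphere in place of an explicit tangent-space projection; the radius estimate r = sum(d)/sum(g) is one plausible scale fit and is not necessarily the estimator used in the paper.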
Stochastic nonlinear dimensionality reduction based on nearest neighbors
TIAN Shoucai, SUN Xili, LU Yonggang
Journal of Computer Applications    2016, 36 (2): 377-381.   DOI: 10.11772/j.issn.1001-9081.2016.02.0377
As linear dimensionality reduction methods usually cannot produce a satisfactory low-dimensional embedding for data with nonlinear structure, a new nonlinear dimensionality reduction method named NNSE was proposed to preserve the local nearest-neighbor information of the high-dimensional space. Firstly, the nearest neighbors were found by computing the Euclidean distances between sample points in the high-dimensional space, and a random initial distribution of the data points was generated in the low-dimensional space. Secondly, the positions of the data points were iteratively optimized by moving each point toward the mean position of its nearest neighbors found in the high-dimensional space, until the embedding became stable. Compared with t-SNE (t-distributed Stochastic Neighbor Embedding), a state-of-the-art nonlinear stochastic dimensionality reduction method, NNSE produces a similar low-dimensional visualization; however, a quantitative indicator shows that NNSE is superior to t-SNE in preserving the local nearest-neighbor information in the low-dimensional embedding.
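A literal reading of the two steps in the abstract fits in a few lines of NumPy. This is a sketch under stated assumptions, not the published NNSE implementation: the re-centering and re-scaling at the end of each iteration is an added safeguard against the layout collapsing to a single point (a detail the abstract does not describe), and all names and parameter values are illustrative.

import numpy as np

def nnse_sketch(X, k=10, dim=2, step=0.1, n_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # nearest neighbors are found once, in the high-dimensional space
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nbr = np.argsort(D, axis=1)[:, :k]
    # random initial distribution in the low-dimensional space
    Y = rng.uniform(-1.0, 1.0, size=(n, dim))
    for _ in range(n_iter):
        target = Y[nbr].mean(axis=1)   # mean position of each point's neighbors
        Y += step * (target - Y)       # move each point toward that mean
        Y -= Y.mean(axis=0)            # re-center and re-scale to prevent
        Y /= Y.std(axis=0) + 1e-12     # global collapse (our addition)
    return Y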